Genome-wide association mapping for component traits of drought tolerance in dry beans (Phaseolus vulgaris L.)

Understanding the genetic basis of traits of economic importance under drought stressed and well-watered conditions is important in enhancing genetic gains in dry beans (Phaseolus vulgaris L.). This research aims to: (i) identify markers associated with agronomic and physiological traits for drought tolerance and (ii) identify drought-related putative candidate genes within the mapped genomic regions. An andean and middle-american diversity panel (AMDP) comprising of 185 genotypes was screened in the field under drought stressed and well-watered conditions for two successive seasons. Agronomic and physiological traits, viz., days to 50% flowering (DFW), plant height (PH), days to physiological maturity (DPM), grain yield (GYD), 100-seed weight (SW), leaf temperature (LT), leaf chlorophyll content (LCC) and stomatal conductance (SC) were phenotyped. Principal component and association analysis were conducted using the filtered 9370 Diversity Arrays Technology sequencing (DArTseq) markers. The mean PH, GYD, SW, DPM, LCC and SC of the panel was reduced by 12.1, 29.6, 10.3, 12.6, 28.5 and 62.0%, respectively under drought stressed conditions. Population structure analysis revealed two sub-populations, which corresponded to the andean and middle-american gene pools. Markers explained 0.08–0.10, 0.22–0.23, 0.29–0.32, 0.43–0.44, 0.65–0.66 and 0.69–0.70 of the total phenotypic variability (R2) for SC, LT, PH, GYD, SW and DFW, respectively under drought stressed conditions. For well-watered conditions, R2 varied from 0.08 (LT) to 0.70 (DPM). Overall, 68 significant (p < 10−03) marker-trait associations (MTAs) and 22 putative candidate genes were identified across drought stressed and well-watered conditions. Most of the identified genes had known biological functions related to regulating the response to drought stress. The findings provide new insights into the genetic architecture of drought stress tolerance in common bean. The findings also provide potential candidate SNPs and putative genes that can be utilized in gene discovery and marker-assisted breeding for drought tolerance after validation.


Introduction
Common bean (Phaseolus vulgaris L., 2n = 2x = 22) is one of the major pulse crops consumed worldwide with a relatively small diploid genome size of approximately 473 Mb [1]. It is a cheap source of proteins and important micronutrients such as iron (Fe) and zinc (Zn) for millions in many African and Latin American countries [2,3]. Beebe et al. [4] reported that Sub-Saharan Africa (SSA) and Latin America produce the largest volume of common beans, representing more than 60% of the world's bean production. Common bean was subjected to two parallel domestication events on the American continent, resulting in two different primary gene pools namely the andean and the middle-american [5,6]. The andean gene pool originated from the Andes mountains of South America and consists of medium (25-40 grams per 100 seeds) or large (� 40 grams per 100 seeds) seeded genotypes [7]. On the other hand, the middle-american gene pool is native to Central America and Mexico, and comprises of small seeded genotypes (� 25 grams per 100 seeds). According to Bitocchi et al. [8], there is more genetic variation within the middle-american gene pool compared to the andean gene pool.
Common beans are notably sensitive to climatic and environmental variations. This is aggravated by the fact that most bean growing regions in the world experience different production constraints including intermittent and terminal drought stress which adversely affect grain yield [9][10][11][12]. As reported by Katungi et al. [13], 73% of common beans production in SSA occurs in environments which experience moderate to severe drought stress. Beebe et al. [4], Hoyos-Villegas et al. [14] and Valdisser et al. [15] reiterated that drought stress is the most important grain yield-limiting abiotic factor of dry beans worldwide. It is predicted from various climate models that the duration and frequency of droughts are expected to increase in SSA [16]. Drought stress reduces stomatal conductance, total chlorophyll content, leaf expansion, number of days to physiological maturity, seed yield and biomass, number of pods and seeds per plant, seed size and harvest index [17][18][19][20][21][22]. According to Asfaw et al. [23], severe drought stress can result in grain yield losses of up to 80%. In Zimbabwe, grain yield reductions of more than 50% were reported by Mutari et al. [24] under terminal drought stress.
As reported by Mutari et al. [25], dry beans farmers in Zimbabwe have been using different mitigation strategies to minimize grain yield losses due to terminal drought stress. These strategies include soil mulching, ridging, cultivating the soil to retain more moisture and reducing the area under the bean crop. However, host plant resistance is a more sustainable, environmentally friendly and labour saving technology for managing drought stress in common beans compared to the multiple cultural practices. For this reason, most dry beans breeding programmes aim to introduce drought tolerance into new cultivars to address the needs and preferences of smallholder farmers in the face of climate change [26].
Understanding the underlying genetic architecture of agronomic and physiological traits under drought stressed and well-watered conditions is a prerequisite for the genetic improvement of these traits using MAS. Thus, dissecting the genetic basis of multiple polygenic traits of economic importance such as drought tolerance with respect to the genomic regions and/or genes involved and their effects is important. It is particularly important in improving genetic gains when breeding for superior grain yield in dry beans under drought stressed and wellwatered environments. This can be accomplished through complementary approaches such as GWAS and genomic prediction models [6]. Genome wide association study is a powerful tool for characterizing the genetic basis of quantitative traits at high level of genetic resolution [34,[37][38][39][40][41]. It is also a powerful tool for identifying multiple candidate genes (marker alleles) associated with variation in quantitative traits (marker-trait associations; MTA) of interest in crop species using high density DNA markers [34,[37][38][39][40][41].
Genome wide association study is also known as association mapping (AM) or linkage disequilibrium (LD) mapping [42]. It is based on linkage LD and historical recombination events of alleles of detected quantitative trait loci (QTL) at relatively high level of genetic resolution [43,44]. The high level of genetic or mapping resolution is due to high genetic variability and high number of recombination events in diverse natural population such as landraces, elite breeding lines and improved cultivars [43][44][45]. The historical recombination events would have naturally occurred during the evolution and domestication of the crop, and crop improvement (several generations) [33]. Therefore, it is inexpensive and reduces research time (no need to develop a mapping population) with greater allele numbers. The identification of genomic regions and diagnostic genetic markers associated with grain yield and yield-attributing traits under drought stressed and well-watered conditions will facilitate trait introgression and MAS.
Genome wide association study has been successfully used to detect MTAs and QTLs in common beans. Several QTLs associated with disease and insect pest tolerance have been identified in dry beans [32,[46][47][48][49][50]. Similarly, MTAs were identified for drought tolerance traits in dry beans [14,15,[51][52][53]. Also, MTAs were identified for nutritional composition-related traits [6,33], symbiotic nitrogen fixation [54], cooking time [55] and photosynthetic traits [34,56] in dry beans. Genomic regions governing agronomic traits in drought stressed and high yield potential environments were also identified in dry beans [1,6,14,34,57]. Thus, several significant MTAs have been identified in previous GWAS studies for agronomic traits in drought stressed environments. However, the use of very low thresholds (-log 10 p-value � 3.0) in most of the studies in determining significant MTAs might have resulted in many false positives. In addition, despite the fact that several QTLs/MTAs associated with agronomic traits have been identified in dry beans, further genetic studies are required using different genetic backgrounds to reach a saturation point. Moreover, most of the reported putative genes for agronomic and physiological traits were detected under yield potential environments.
Additionally, some of the previous mapping studies [14,17,51,[58][59][60][61] conducted on agronomic and physiological traits used a small population size and a limited number of molecular markers. This resulted in QTL with low resolution or poor estimation of marker effects. Furthermore, this also made it difficult to make inferences on putative candidate genes correlated with the identified QTL. In addition, some of the previously identified QTLs explained low total genetic variance [23]. Moreover some of the previously identified QTLs were sometimes not stable across environments due to genotype by environment interaction (GEI) [52]. Thus their potential for MAS in developing genotypes that are tolerant to drought stress was inconclusive. Therefore, additional studies are required to dissect the genetic basis of agronomic and physiological traits in dry beans under drought stressed and optimal environments for increased genetic gains. The objectives of this study were: (i) to identify single nucleotide polymorphism (SNP) markers significantly associated with agronomic and physiological traits for drought tolerance and; (ii) to identify drought-related putative candidate genes associated with traits within the mapped genomic regions.

Description of the study location
The field experiments (drought stress and well-watered) were conducted at the screening site for drought stress tolerance located at Save Valley Experiment Station (SVES), Zimbabwe. The experiments were carried out during the 2019 and 2020 dry winter seasons (April-July). Save Valley Experiment Station is characterised by clay soils and is located in the drier lowveld region of Zimbabwe where dry beans are commercially produced during the dry winter season ( Table 1). The research station receives an average annual rainfall of 450 mm which is usually distributed between the months of December and April. In both seasons, no precipitation was received during the trial evaluation period. Historically, SVES presents few rainfall occurrences during the dry winter season [24]. Daily temperatures (˚C) and relative humidity (%) were recorded with a digital weather station (Table 1) during the growing seasons. More details on the agro-ecological characteristics of SVES are outlined in Table 1.

Germplasm
A total of 185 dry beans genotypes constituted the andean and middle-American diversity panel (AMDP). The AMDP comprised of landrace collections (25), released cultivars (18) and elite breeding lines (142) of different market classes such as sugars, calimas, small whites, large whites and large red kidneys (S1 Table). The genotypes were sourced from public and private breeding institutions located in different geographic regions. These included the Alliance of Bioversity International and International Center for Tropical Agriculture (ABC) in Colombia (87), ABC in Malawi (67), ABC in Uganda (18), Ethiopian Institute of Agricultural Research (EIAR) in Ethiopia (3), Crop Breeding Institute in Zimbabwe (6) and Seed-Co, also in Zimbabwe (4) (S1 Table). The elite breeding lines were selected on the basis of their genetic divergence and breeding history for drought tolerance. On the other hand, the released cultivars and landrace collections were included in the study to broaden the genetic diversity for agronomic and physiological traits. Furthermore, the released cultivars were included in the study due to their great economic importance to the Zimbabwean dry bean breeding program. The details of the genotypes used in the study are presented in S1

Field phenotyping of the diversity panel
Experimental design, irrigation scheduling and trial management. The AMDP was evaluated side by side under drought stressed and well-watered treatment conditions. In both seasons, the genotypes in both drought stressed and well-watered treatments were established in a 5 x 37 alpha lattice design with two replications. The seepage of water from the wellwatered treatment to the drought stressed treatment was minimized by maintaining a 30 m buffer zone between the two treatments. Each genotype was hand planted in four-row plots of 3 m in length, and an inter-row spacing of 0.45 m. Compound D (N = 7%, P = 14%, K = 7%) was applied at planting at a rate of 300 kg/ha. Ammonium nitrate (34.5% N) was applied in both drought stressed and well-watered treatments as a top-dressing fertilizer thirty days after emergence at a rate of 100 kg/ha. An overhead sprinkler irrigation system was used to irrigate both drought stressed and well-watered treatments during both seasons of evaluation.
The irrigation cycles in both drought stressed and well-watered treatments were as described by Mutari et al. [24]. The amount of soil moisture was monitored using tensiometers (2 per station) which were installed (0.5 m and 1 m depths) at four different stations in each treatment. Soil moisture was kept at field capacity during the crop development period in the well-watered treatment. In the drought stressed treatment, soil moisture was kept at field capacity until when 80% of the plants had flowered. After that, terminal drought stress was imposed up to physiological maturity by withholding irrigation water to 30% of the field capacity before re-irrigating on the basis of readings from tensiometers. Six cycles of irrigation amounting to 256 mm and 253 mm, respectively (each cycle with roughly 42 mm) were applied to the drought stressed experiments during the 2019 and 2020 seasons. On the other hand, 10 cycles of irrigation were applied to the well-watered experiments in 2019 (424 mm) and 2020 (418 mm), each cycle with roughly 42 mm). In both seasons, recommended agronomic practices were followed for the management and control of pests such as diseases, insects and weeds.
Collection of data on agronomic and physiological traits. At the flowering stage of growth, the number of days from planting to 50% flowering (DFW) were recorded in both treatments. The DFW was recorded when 50% of the plants in a plot had at least one or more open flowers. At mid-pod filling, leaf temperature (LT;˚C), stomatal conductance (SC; mmol m -2 s -1 ) and leaf chlorophyll (LCC) content were collected on all genotypes in both drought stressed and well-watered treatments. The LT and SC data were recorded from the surface of the uppermost fully expanded young leaf. This was done between 11:00 am to 14:00 pm using a FLUKE precision infrared thermometer (Everest Interscience, Tucson, AZ, USA) and a hand-held leaf porometer (Decagon Devices 1 , Pullman, WA, USA), respectively. Three readings were collected on three different randomly chosen plants from each plot per replicate in both drought stressed and well-watered treatments. The three measurements were averaged to obtain one final reading per plot. Phenotyping for LT and SC was done for an average of six days on clear, sunny days with minimal wind. Regarding the LCC, this was measured using a soil and plant analysis development (SPAD) chlorophyll meter (SPAD-502Plus, Konica-Minolta, Osaka, Japan) on two fully developed leaves of three plants in each plot. Then, the average value was calculated.
At physiological maturity, the following traits were recorded from the two inner rows from every plot for every genotype in both treatments and seasons: plant height (PH; cm), days from planting to physiological maturity (DPM), grain yield (GYD; kg/ha) and 100-seed weight (SW; g). Plant height which was averaged from three plants per plot was measured from the base of the plant (soil surface) to the top node bearing at least one dry pod.. The DPM were recorded as the average number of days from planting to when 95% of pods in a plot lost their green colour. Grain yield was recorded from the two middle rows in each plot using a weighing scale. Grain yield was subsequently converted to kilograms per hectare (kg/ha) at 12.5% moisture basis. The SW was determined using a beam balance weighing scale by measuring the weight of 100 seeds that had been selected randomly from each plot harvest. Genotypes which combined high grain yield (above the trial mean) with desirable values for stomatal conductance (high), leaf chlorophyll (high), canopy temperature (low), physiological maturity (early) and seed size (large) under drought stressed conditions were presumed to be tolerant to drought stress [22].

Statistical analysis of phenotypic data
Before conducting analysis of variance, normality tests were conducted in Genstat Discovery 18th Edition [62] using residuals of the agronomic and physiological traits. The agronomic and physiological traits were analysed in Genstat Discovery 18th Edition [62] using mixed models from which the best linear unbiased predictors (BLUPs) were obtained. The BLUPs were estimated for the studied traits to minimize the environmental and seasonal effects. The BLUPs for each entry were estimated through individual environment (drought stressed or well-watered) analysis, and by combined analysis (across water regimes). In the first step of analysis (single-environment analysis), the phenotypic data of each individual environment were analysed using a mixed linear model (MLM). In this model, blocks and genotypes were treated as random effects, and replications were considered as fixed effects. Genotype effects were declared to be random to enable the calculation of BLUPs and broad-sense heritability (H 2 ). The MLM presented below was fitted: where Y jkl = is the phenotypic observation of the genotype i in replicate j in block l within replicate j, μ = grand mean effect, g i = random effect associated with genotype i, r j = fixed effect associated with replicate j, b lj = random effect associated with block l nested within replicate j, and e ijl = residual effect associated with observation ijl. For a combined or multi-environment analysis, a MLM was used. In this model, blocks nested within replications, replicates nested within environments, genotypes and their interactions with environments (GEI) were considered as random effects. Environments, defined as year x water regime combination were considered as fixed effects. The MLM presented below was fitted: where Y ijkl = effect of genotype i in environment j and kth replication within environment j and Ith block nested within replicate k and environment j, μ = grand mean, G i = random effect of the ith genotype, E j = fixed effect of the jth environment, R k[j] = random effect associated with the replicate k nested within environment j, B l[jk] = random effect of block l nested within environment j and replicate k, GE ij = random effect of the interaction between genotype i and environment j, and e ijkl = random error associated with observation ijkl. The analysis was performed using the Restricted Maximum Likelihood (REML) method implemented in GenStat 18th edition [62]. Broad-sense heritability estimates for the agronomic and physiological traits were calculated following the formula proposed by Cullis et al. [63]. Heritability was classified as low when less than 30%, moderate when between 30-60% and high when more 60% [64]. Drought intensity index (DII) at the location, percentage GYD reduction (%GYR) due to drought stress, drought susceptibility index (DSI), geometric mean productivity (GMP) and drought tolerance index (DTI) of each entry were calculated as described by Mutari et al. [24]. The drought tolerance indices were used in this study to support the observed phenotypes under well-watered and drought stressed conditions with respect to agronomic and physiological traits. Genotypes with low values for %GYR, DSI and high values for DTI and GMP were presumed to be tolerant to drought stress [24]. Furthermore, a ranking method was used to select superior drought tolerant genotypes by calculating the mean rank of each genotype across all the studied indices.

Genotyping of the diversity panel
Genomic DNA of the 185 genotypes was extracted from young leaves of 2-week old bean plants following the plant DNA extraction protocol for Diversity Arrays Technology (DArT; [65]). A NanoDrop Spectrophotometer (ND-8000, NanoDrop Technologies, Inc.) was used to determine the concentration of the DNA. Agarose gel (1% agarose gel) electrophoresis was used to evaluate the quality of the DNA. The DNA from the samples used in this study were genotyped using the Diversity Arrays Technology Sequencing (DArTseq) protocol using a set of 24,450 silico DArT markers. The DArT markers used were evenly distributed across all the 11 chromosomes of common bean. Genotyping by sequencing (GBS) was done at the Biosciences Eastern and Central Africa (BecA) Hub of the International Livestock Research Institute (BecA-ILRI) in Kenya. The silico DArTs used had polymorphic information content (PIC) values ranging from 0.01 to 0.50, reproducibility values of 1.00, and the proportion of missing data per marker was 7% (mean call rate of 93%, ranging from 81 to 100%). The entire data set of SNP markers was filtered in TASSEL v5.2 [66] to remove SNP loci with unknown physical positions on the common bean genome, monomorphic SNPs, and SNP markers with more than 20% missing data and minor allele frequency (MAF) of less than 5% (<0.05) threshold [15,49,67]. A final total of 9370 (38%) DArTseq-derived SNPs distributed across the 11 chromosomes were retained after filtering for use in association analysis and population structure analysis via principal component analysis (PCA).

Inference of population structure
The genotypic data was imputed for missing alleles of SNPs on the KDCompute online sever (https://kdcompute.igs-africa.org/kdcompute/). This was done using the optimal imputation algorithm to increase the power of the study. KDCompute was also used to graphically visualize the distribution of SNPs across the common bean genome. The population genetic structure was determined based on the Bayesian model-based clustering approach using the Bayesian inference program in STRUCTURE software version 2.3.4 [68]. A subset of additionally filtered SNP markers (4095) at or near Hardy-Weinberg equilibrium (r 2 < 0.8) and that covered the entire genome were used in population structure analysis with STRUCTURE [14,15,31]. This was done to reduce the background and admixture linkage disequilibrium (LD) owing to linked loci [68]. Settings for the STRUCTURE program were set as follows to derive the population structure: a burn-in period length of 10,000, and after burn-in, 10,000 Markov Chain-Monte Carlo (MCMC) repetitions. The number of sub-populations or clusters (K) was set from 1 to 10, with ten independent runs for each K [3,48,55]. The best K-value explaining the population structure was inferred using the Delta K (ΔK) method in Evanno et al. [69] implemented in the on-line tool structure harvester software [70]. Genotypes with ancestry probability/ coefficient � 0.90 (� 90%) (pure genotypes) for the andean sub-population were allocated to the andean gene pool [31,71] (S1 Table). On the other hand, genotypes with ancestry probability � 0.90 for the middle-american sub-population were allocated to the middle-american gene pool. Those with ancestry probability < 0.90 were considered as admixed [71]. The clustering of the AMDP was further assessed and visualized in a 3D scatter plot using PCA in prcomp R 3.0 function [72].

Marker-trait association tests and linkage disequilibrium analyses
The filtered 9370 SNPs and the adjusted trait means (BLUPs) for each of the environments (drought stressed and well-watered) were used as input data in marker-trait association (MTA) analysis. The more conservative compressed mixed linear model (CMLM) procedure in the genome association and prediction integrated tool (GAPIT) (v3) program of R software was used to determine the MTAs following the Q + K model according to Lipka et al. [73]. Phaseolus vulgaris is characterised by a strong genetic structure necessitating the need to use the Q + K model [74]. The CMLM incorporated both the population structure (Q; fixed effect) and kinship (K; random effect) matrices as covariates. This was done to correct the population structure, increase statistical power of the analysis and minimize false positives (spurious MTAs) [67,72,75]. The K matrix was included in the association analysis to correct for cryptic relatedness within the AMDP [54,67]. The threshold for significant MTA was set at p < 0.001 to reduce the risk of false MTAs.
The Manhattan plots drawn using the CMplot package in R 3.5.3 were used to visualise the significant MTAs for each environment. The p-values were plotted as-log 10 (p) to generate the Quantile-Quantile (Q-Q) and Manhattan plots using the CMplot package in R package [76]. The Q-Q plots were produced from the observed and expected logarithm of the odds (LOD) scores for each trait. The GAPIT program of R software was used to deduce linkage disequilibrium. The LD Heatmap package in R 3.0 was used to generate the LD Heatmaps for the significant markers of each trait [77,78]. Alleles with positive additive effects resulting in higher values of GYD, SZ and LCC were described as "superior alleles" under both drought stressed and well-watered conditions. On the other hand, alleles resulting in decreased GYD, SZ, and LCC were "inferior alleles". In addition, alleles with negative effects resulting in lower values of DFW, DPM, LT and SC were considered to be "superior alleles" under drought stressed conditions.

Putative candidate gene prediction
The Jbrowse feature on Phytozome v13 was used to browse the P. vulgaris G19833 v2.1 reference genome sequence [1]. This was done to gain insight into potential putative candidate genes associated with significant SNPs for each trait. The functional annotation of the gene was checked on Phytozome v13 website (http://phytozome.net) to postulate the role of the gene in the control of a target trait. Plausible candidate genes were identified based on the window size of 200 kb (maximum ± 100 kb) on either side (upstream and downstream) of the significant marker [74,79]. The window size of 200 kb is the average LD [74,79]. A gene was considered a potential candidate using the following criteria: (i) if the gene contained a significant SNP or the gene contained a SNP that was in LD with a significant SNP [3], and (ii) if the gene had a known role related to regulating drought stress response, plant growth and development under water deficit based on gene ontology term descriptions in Phytozome v13. For the positional candidate genes that did not have adequate functional annotation information on Phytozome v13, the sequence data of the significant SNP was used against the NCBI database. This was done using the basic local alignment search tool for nucleotide (BLASTn; https://blast.ncbi.nlm.nih.gov/smartblast/smartBlast.cgi).

Variations of agronomic and physiological traits under two water regimes
The descriptive statistics and H 2 estimates for the agronomic and physiological traits under drought stressed and well-watered environments are shown in Table 2. Residual maximum likelihood analysis revealed highly significant (p < 0.001) genotypic main effects on all the studied traits under both environments. This supported the use of the AMDP for GWAS purposes. Overall, phenotypic variability was observed among the genotypes for DFW, LCC, LT, SC, PH, DPM, GYD and SW under drought stressed and well-watered conditions (S1 and S2  Combined GYD data over two seasons across environments revealed that the highest yielding genotype was G184 (DAB91-2222,7 kg/ha) followed by G176 (DAB302-2097.5 kg/ha) and G147 (CIM-SUG07-ALS-S1-3-2080,1 kg/ha) ( Table 3).
The genetic structure result of the AMDP was verified with the PCA based on SNP marker data and is illustrated by a 3D scatter plot (Fig 1C). The first principal component (PC) accounted for more than 55% of the observed genotypic variability in the AMDP (Fig 1D). On the other hand, the second and third PCs separately accounted for less than 5% of the overall genetic variance in the AMDP (Fig 1D). The PCA also divided the genotypes into two distinct clusters (andean and middle-american sub-populations) as were found with STRUCTURE output (Fig 1C). Furthermore, the andean-middle american admixed genotypes (positioned between the two groups) were isolated from the andean and middle-american sub-groups by PCA (Fig 1C).

Analysis of marker-trait associations under drought stressed conditions
The significant MTAs and their respective statistical parameters for agronomic and physiological traits are summarised in Table 4. In this study, the threshold for significant MTA was set at p < 0.001 to reduce the risk of false MTAs. Under drought stressed conditions, 29 significant MTAs were identified for six traits (excluding DPM and LCC) with p < 10 −03 . The associations are shown in Fig 2. The quantile-quantile (QQ) plots for the studied traits revealed that the expected and observed probability values were normally distributed (S3 Fig). The highest number of significant MTAs were observed on P. vulgaris (Pv) chromosome Pv11 (28%), followed by Pv8 (17%), with the least on chromosomes Pv6 and Pv4, both with 3%. No significant associations for DPM and LCC were identified under drought stressed conditions in this study. The highest number of significant MTAs were identified for PH (15), and the SNPs were distributed across six different chromosomes (Pv1, Pv5, Pv7, Pv8, Pv10, and Pv11). Additionally, the allele effect of these SNPs ranged from -16.03 cm (SNP 8198531) to 17.82 cm (SNP 100101387).
Four SNPs (SNPs 2362591, 2362591, 45231105, and 40802478) that have a significant association with GYD were also identified. These were located on chromosomes Pv4 and Pv11, with allele effect ranging from -174.56 kg/ha (SNP 3381526) to 202.90 kg/ha (SNP 3382688). Notably, 75% of the SNPs that were significantly associated with GYD were located on chromosome Pv11. The sum of the SNPs with a significant positive effect on GYD was 341,88 kg/ ha and -351,23 kg/ha for all the SNPs with a significant negative effect on GYD (Table 4). For SW, two SNPs that were significantly associated with this trait were identified on chromosomes Pv03 and Pv08. These two SNPs had allelic effects ranging from -2.41 g per 100 seeds (SNP 3383047) to 4.46 g per 100 seeds (SNP 16647170). Regarding physiological traits, SNPs were identified that have a significant association with LT. The identified SNPs were distributed across two chromosomes (Pv6 and Pv8), with allele effect ranging from -1.23˚C (SNP 100065202) to 1.34˚C (SNP 100106140).
Notably, two SNPs on chromosomes Pv1 and Pv2 were significantly associated with SC.

Analysis of marker-trait associations under well-watered environments
The significant MTAs and their respective statistical parameters for agronomic and physiological traits are summarised in Table 5. Under well-watered conditions, 39 significant MTAs were detected for six traits (excluding SW and SC) with p < 10 −03 . The associations are shown in Fig 3. The quantile-quantile (QQ) plots for the studied traits revealed that the expected and observed probability values were normally distributed (S4 Fig). The highest number of       Notably, one SNP (SNP 100124606) on chromosome Pv01 was significantly associated with GYD. This SNP had a large positive allelic effect of 199.11 kg/ha. In addition, this SNP had a MAF of 0.17 in the population. Regarding physiological traits, SNPs were identified that have a significant association with LCC. The identified SNPs were distributed across two chromosomes (Pv6 and Pv8). Furthermore these SNPs had positive allele effects ranging from 1.90 (SNP 100167635) to 2.18 (SNP 8198945). As for LT, nine significant associations were detected, with markers accounting for 0.08-0. 15 Table 5).
Well-watered environments. A total of fourteen potential candidate genes (DFW-4; DPM-1; LCC-2; PH-5; LT-2) were identified under well-watered environments (Table 5 and Fig 3). The candidate genes for DFW were identified on chromosomes Pv03 (Phvul.003G011400), Pv04 (Phvul.004G037700), Pv07 (Phvul.007G144000) and Pv11 (Phvul.0011G166300) ( Table 5). On the other hand, the candidate gene for DPM was identified on chromosome Pv02 (Phvul.002G112700) ( Table 5). Candidate genes for DFW had diverse putative functions related to SORTING NEXIN-13, transcription factor TCP 13, U6 SNRNAassociated SM LIKE PROTEIN LSM4 and NHL domain containing protein. On the other hand, the candidate gene for DPM had a putative function related to the activity of thiol disulphide oxidoreductase. Chromosomes Pv4 and Pv5 harboured the two candidate genes for LT namely Phvul.004G055500 and Phvul.005G077500, respectively (Table 5). These genes had diverse putative functions related to the mitochondrial transcription termination factor family protein and leucine rich repeat protein associated with apoptosis in muscle tissue, respectively.

Linkage disequilibrium analysis using significant SNP markers
The analysis of LD using SNP markers is shown in Fig 4. A high and extensive LD was observed for the common bean genome. This is expected in self-pollinated crops such as common bean. The results show that the overall LD decay across the genome of 185 common bean genotypes was 30 bp, at a cut-off of r 2 = 0.4. Generally, there was a slow decay of LD throughout the common bean genome. The LD extended to several mega-bases as shown in Fig 4. The population structure usually affects the extent of LD decay.

Terminal drought stress adaptability analysis
Breeding for enhanced grain yield under both drought stressed and well-watered environments is one of the greatest challenges faced by dry beans breeders [15]. The observed moderate (0.30) DII suggests that drought intensity also depends on the type and genetic diversity of the genotypes used. Similarly, Darkwa et al. [22] reported a DII of 0.30 in different market classes of dry beans. Genotypes such as DAB91, DAB302, AFR703, CIM-SUG07-ALS-51-3, DAB487, DAB287, CIM-RM09-ALS-BSM-12 and DAB539 which exhibited high GMP values might have combined the two mechanisms that contribute to superior grain yield under both study environments. In Sub-Saharan Africa (SSA), breeders prefer genotypes that perform well under both drought stressed and well-watered environments (high GMP). This is because rainfall patterns in SSA differ by season. Furthermore, 24% of the genotypes had negative % GYR and DSI values suggesting that they had higher grain yield under drought stressed compared to well-watered conditions. This indicates that the mechanisms that contribute to superior grain yield performance under drought stressed and well-watered conditions differ. Thus, it might be worthwhile to determine the genetic basis of superior grain yield under drought stressed compared to well-watered environments in these 44 genotypes that distinguished them from the rest. Similar findings were reported by Darkwa et al. [22] in dry beans drought screening experiments.
Terminal drought stress is an important factor limiting common bean productivity in the SSA region. Therefore, the identification and subsequent release of drought tolerant genotypes will positively impact on socio-economic, food and nutrition security in SSA. These genotypes could also serve as important genetic resources in drought tolerance breeding programmes to improve released cultivars. Both DAB287 and AFR703 were released in Zimbabwe as Sweet William and Gloxinia, respectively. Among the genotypes that were less sensitive to drought stress based on their low DSI, %GYR and overall mean ranks across the indices, most of the top 20 genotypes were of the andean gene pool, coded as drought andean (DAB lines) (Table 3  and S2 Table). Notably, all the DAB lines evaluated in this study were developed for improved tolerance to drought by the International Centre for Tropical Agriculture in Colombia. The current observation suggests that progress in improving drought tolerance in the middleamerican gene pool has been limited compared to the andean gene pool. This is in agreement with Assefa et al. [81] who reported that progress in improving drought tolerance in navy beans worldwide has been limited compared to the other commercial classes of small seeded middle-american beans.

Variations in agronomic and physiological traits
The low to moderate H 2 estimates observed for SC, LT and LCC under drought stressed and well-watered conditions imply that these physiological traits might be influenced by a number of genes (polygenic inheritance).. Therefore, direct selection for SC, LT and LCC under drought stressed and well-watered conditions could be a challenge to dry beans breeders. On the other hand, the high H 2 estimates (97%) for seed size observed under drought stressed and well-watered environments reflect the predominance of additive gene action (genetic control of this trait) across environments. The current findings are in agreement with Assefa et al. [80] and Hoyos-Villegas et al. [14] who reported H 2 estimates of 77 and 93.4% for seed size, respectively under well-watered conditions.
Crop plants close their stomata when exposed to drought stress to minimize excessive water loss (avoid dehydration) through transpiration. However, the closing of stomata reduces stomatal conductance. Furthermore italso affects the cooling mechanisms resulting in increased leaf or canopy temperature. Therefore, in this study, drought stress increased LT by 21.6%. Drought stress also reduced GYD by 30%, close to the GYD reductions reported by Schneider et al. [83] [26%], Darkwa et al. [22] [30%] and Mutari et al. [24] [28%] in dry beans drought tolerance screening trials.

Population structure and linkage disequilibrium analysis
The AMDP was delineated into two distinct major sub-populations based on the genotypes' genetic ancestry, and this corresponded to the andean and middle-american gene pools ( Fig  1B and 1C). This is expected considering that the domestication of dry beans on the American continent in two main centres of origin (andean and middle-american regions) resulted in two major and diverse gene pools [59,84]. Cichy et al. [55], Raggi et al. [74], Tigist et al. [48], Nkhata et al. [49], Ojwang et al. [71], and Liu et al. [85] also observed two sub-populations (andean and middle-american gene pools) in their GWAS studies.
A number of the identified andean-middle american admixed genotypes carrying genomic regions from both gene pools are released cultivars. These have been released in Rwanda (RWR2154), Malawi (NUA59-4), Zimbabwe (SMC16, NUA674, and Sweet William) and Eswatini (NUA674) [86][87][88][89]. Furthermore, most of the admixed genotypes have commercial seed types, are biofortified (RWR2154, SMC16, SMC21, NUA674 and NUA59-4) and drought tolerant (Sweet William, DAB115, DAB63, DAB142 and DAB477). Singh [90], Beebe et al. [20] and Beebe [84] reported that interracial hybridizations between races or sister species (Phaseolus coccineus, Phaseolus acutifolius and Phaseolus dumosus) of Phaseolus vulgaris have been widely used in dry beans improvement programmes. These have been widely used when breeding for enhanced grain yield, micronutrient density and drought tolerance. For example, the biofortified admixed genotype NUA674 is a product of an inter-gene pool cross between AND277 (andean gene pool) and G21242 (andean-middle-american inter-gene pool landrace) made at the International Centre of Tropical Agriculture in Colombia [87]. Islam et al. [91] and Beebe [84], also reported that one of the parents to NUA674, G21242 is a product of andean-middle-american inter-gene pool hybridization, validating the current findings. In addition, G21242 is commonly used as a source of high seed iron in biofortification breeding programmes [84,91]. The current observations suggests that most of the admixed genotypes identified in this study resulted from deliberate breeding efforts (inter-gene pool hybridizations) to introgress genes for enhanced grain yield, drought tolerance and micronutrient density. Similar findings were reported by Hoyos-Villegas et al. [14] and Tigist et al. [48] in common bean.
The biofortified and drought tolerant admixed genotypes identified in this study may be used as a bridge to transfer favourable alleles for micronutrient density and drought tolerance into either the andean or middle-american seed types. The extent and structure of LD decay in the study germplasm usually determines the resolution of GWAS. The slow decay of LD observed in this study is expected in self-pollinating crop species, such as common bean. The slow decay is caused by the loss of recombination, which results in a homozygous genetic background. According to Vos et al. [92], recombination events in crops with a homozygous genetic background are ineffective to cause LD decay, resulting in extended (large) and slow decay of LD. The slow decay of LD, and the large extent of LD observed in this study corroborates previous reports in dry beans [32,85].

Marker-trait associations
In dry beans, it is important to enhance drought stress tolerance by identifying genotypes with high grain yield potential under water deficit conditions, and by introgressing desirable alleles conferring drought tolerance. The mean call rate (93%) and reproducibility (100%) of the silico DArTs used in this study were consistent with previous reports [15,49]. This demonstrates the reliability and high quality of this set of silico DArTs. A higher number of significant MTAs were detected under well-watered conditions, corroborating previous reports in bread wheat (Triticum aestivum L.) [93,94] and dry beans [15]. The observed trend could be due to the fact that drought tolerance is a complex polygenic trait which is highly influenced by the production environment. Thisresults in unpredictable performance of genotypes (genotype-by-environment interaction [GEI]) under different environments (drought stressed and wellwatered). Even though a smaller number of significant MTA was observed under drought stressed conditions compared to the well-watered conditions, novel genomic regions associated with key agronomic and physiological traits were detected under drought stressed conditions. Notably, no significant SNPs for all the studied agronomic and physiological traits were consistent across drought stressed and well-watered treatments. Similar findings were reported in wheat ( [93]-plant height and spike length) and dry beans ([15]-grain yield) under drought stressed and well-watered treatments. The observed trend suggests that some markers may influence the expression of phenotypic traits differently under drought stressed and wellwatered environments. Furthermore, the GEI could have confounded the identification of significant SNPs that are consistent across drought stressed and well-watered treatments.
The highest number of significant SNPs were identified for PH. Similar findings were reported by Sukumaran et al. [95] who observed 30 significant MTAs for PH in durum wheat (Triticum turgidum L. ssp. Durum). Some of the SNPs identified in this study were located on genomic regions that had been previously reported to be harbouring genes and QTLs for the studied traits. For example, in this study, chromosomes Pv01, Pv03, Pv04, Pv06 and Pv07 harboured 1 SNP, 4 SNPs, 3 SNPs, 2 SNPs and 1 SNP, respectively that were significantly associated with DFW under optimal conditions. These results are consistent with Dramadri et al. [34] and Nkhata et al. [49]. Dramadri et al. [34] identified 2 QTLs that were associated with DFW on Pv03 under drought stressed and well-watered conditions. Nkhata et al. [49] identified 2 and 5 SNPs that were significantly associated with DFW on Pv03 and Pv06, respectively under well-watered conditions. Furthermore, Keller et al. [6] identified 6 SNPs, 1 SNP and 1 SNP that were significantly associated with DFW on Pv01, Pv04 and Pv07, respectively under optimal conditions. These findings suggest that the aforementioned QTL regions are stable across different environments and genetic backgrounds. In addition, these findings also suggest that chromosomes Pv01, Pv03, Pv04, Pv06 and Pv07 harbour genes for controlling flowering.
In this study, only one marker (SNP 1667170) was significantly associated with SW on chromosome Pv08 under drought stressed conditions. These results are in accordance with Moghaddam et al. [57] who identified significant MTAs for SW on chromosome Pv8 under drought stressed and well-watered environments. The current findingssuggest that this QTL is stable across different environments and genetic backgrounds. On the contrary, several significant MTAs for SW were previously identified under drought stress on chromosome Pv01, [52], chromosome Pv03 [51], chromosome Pv09 [14], and chromosomes Pv2 to Pv4 and Pv6 to Pv11 [15]. Thus, the detection of significant MTAs for SW on different chromosomes and locations indicates high genetic diversity in common bean with respect to genomic regions associated with SW under drought stress. In this study, the identified SNPs that were significantly associated with GYD under drought stressed conditions were located on chromosomes Pv04 (SNP 3382688) and Pv11 (SNP 3384334 and SNP 3381526). Similarly, Dramadri et al. [34] identified significant QTL signals for GYD and yield components on chromosomes Pv01, Pv02, Pv03, Pv04, Pv06, and Pv11 under drought stressed conditions. Oladzad et al. [96] also identified SNPs that were significantly associated with GYD, placed on chromosomes Pv03, Pv08, and Pv11 under heat stress. Furthermore, Valdisser et al. [15] found 25 QTLs that were associated with GYD on chromosomes Pv02, Pv03, Pv04, Pv08, Pv09 and Pv11 under wellwatered conditions, in agreement with the current findings. These findings suggest that chromosomes Pv04 and Pv11 harbour genes for controlling GYD.
The identification of SNPs associated with GYD, under drought stress, would significantly contribute to the development of molecular tools for MAS and identification of genes of interest for edition. The proportion of the total phenotypic variation (R 2 ) explained by the significant SNP markers for LCC and LT was generally low (0.11-0.12 for LCC under well-watered conditions and 8-15% for LT under well-watered conditions). Therefore, to account for the missing variation, it might be worthwhile to complement the SNP-based GWAS by haplotypebased GWAS [97].

Candidate genes
Drought stressed. The functional annotation revealed that the candidate gene for SC, Phvul.001G254100 on chromosome Pv01 encodes the CCCH zinc finger family protein. The CCCH zinc finger family protein plays an important function in response of plants to biotic and abiotic stresses [98][99][100][101]. This functional gene also plays an important role in physiological and plant developmental processes [101]. Similar findings were reported in Brassica rapa [98], common bean [15] and Barley (Hordeum vulgare L.) [101]. Wang et al. [102], Seong et al. [103] and Selvaraj et al. [104] reported that several types of CCCH zinc finger family genes such as O s C 3 H 10 , O s C 3 H 47 , and OsTZF 5 are involved in the regulation of tolerance to drought stress in rice (Oryza Sativa L.). According to Lin et al. [105], the CCCH zinc finger protein gene confers drought tolerance in plants by regulating the opening and closing of stomata. They further reiterated that genotypes that are tolerant to drought stress have abnormal and lower stomatal conductance under drought stressed conditions. In this study, the marker SNP 3380850 for the gene Phvul.001G254100 which confers tolerance to drought stress exhibited negative allelic effects (-10.79 mmol m −2 s −1 ) on SC.
The functional annotation revealed that the candidate gene for DFW, Phvul.002G122100 on chromosome Pv02 encodes an RNA-recognition motif protein. The RNA-recognition motif protein plays a comprehensive biological function (critical modulators) in abiotic stress (drought, heat flooding, cold and high salinity) responding processes in plants [106]. Zhou et al. [107] observed that the RNA-recognition motif gene "OsCBP20" from rice confers abiotic stress tolerance in escherichia coli. Therefore, the candidate gene Phvul.002G122100 identified in this study may play a protective role under drought stressed conditions. Candidate genes such as Phvul.003G263200 (Pv08) for SW which encodes for NADPH dehydrogenase plays an important role in mechanisms which protect plants against nitro-oxidative stresses [108]. The nitro-oxidative stresses are generated by biotic and abiotic stresses such as drought, low temperature, heat, and salinity [108]. Under drought stressed conditions, the seed is significantly affected by oxidative damages, and oxidative damages are minimized by the activity of NADPH dehydrogenase [109].
The candidate gene for GYD, Phvul.004G150500 on chromosome Pv04, encodes the enzyme, phosphoethanolamine N-methyltransferese in plants. This catalytic enzyme plays an important role in the response of plants to abiotic stresses such as drought and salt tolerance [110]. The catalytic enzyme works by catalysing the methylation of phosphoethanolamine to phosphocholine [110]. Studies conducted by Wang et al. [110] in transgenic tobacco revealed that phosphoethanolamine N-methyltransferese improved the drought tolerance of transgenic tobacco. Notably, the marker (SNP 3382688) for this candidate gene Phvul.004G150500 had large positive allelic effects (202.90 kg/ha) on GYD. The candidate gene for PH, Phvul.001G172300 encodes the calcium transporting ATPase [111]. The calcium transporting ATPase plays an important role in growth and development processes, opening and closing of stomata, hormonal signalling, and regulation of responses to biotic and abiotic stresses in plants [111]. In summary, these results further confirmed that the identified putative potential candidate genes were associated with drought stress tolerance of dry beans. Therefore, the putative candidate genes identified in the current AMDP under drought stressed conditions are important genetic resources. The candidate genes could be utilized in drought tolerance breeding programmes by creating and introgressing new genetic variability into commercial cultivars.
Well-watered conditions. The functional annotation revealed that the candidate gene for PH "Phvul.011G152000" on chromosome Pv11 encodes the threonine protein kinase. Threonine protein kinase is associated with enhanced tolerance to biotic and abiotic stresses in plants [15]. Similar results were reported in dry beans by Valdisser et al. [15]. In rice, kinase causes dwarfism by reducing plant height [112]. Similarly, in this study, the marker SNP 100073620 for the gene "Phvul.001G152000" exhibited negative allelic effects (-7.00 cm) on PH. According to Zhang et al. [112], kinases also has an impact on grain yield. The candidate gene Phvul.004G037700 which was found on chromosome Pv04 in association with DFW encodes transcription factor TCP 13 . The transcription factor families are strongly involved in abiotic and biotic stress responses, including zinc-finger, dehydration-responsive elementbinding (DREB), and basic helix-loop-helix (bHLH) families [113]. The basic helix-loop-helix (bHLH) families regulate plant growth in leaves and roots under water deficit conditions [113]. Studies conducted by Urano et al. [113] in Arabidopsis thaliana revealed that TCP 13 induces changes in leaf (leaf rolling and reduced leaf growth) and root morphology (enhanced root growth). This results in enhanced tolerance to dehydration stress under osmotic stress. The candidate gene Phvul.004G055500 which was found in association with LT on chromosome Pv04 encodes mitochondrial transcription termination factor family protein. According to Kim et al. [114], the mitochondrial transcription termination factor family protein enhances thermo-tolerance in Arabidopsis.

Conclusions
This study contributes many significant MTAs in common bean for agronomic and physiological traits under drought stressed and well-watered environments. The present study identified a total of 68 SNPs that were significantly (p < 10 −03 ) associated with key agronomic and physiological traits under drought stressed and well-watered conditions. The highest number of significant MTAs were observed on chromosome Pv11 in both environments. For the two environments (drought stressed and well-watered), no common SNPs for the studied traits were detected. Overall, twenty-two potential candidate genes were identified across environments. Most of the identified genes had known biological functions related to regulating drought stress response, and growth and development under drought stress. The information generated from this study provides insights into the genetic basis of agronomic and physiological traits under drought stressed and well-watered conditions. It also lays the foundation for future validation studies of drought tolerance genes in dry beans. Thus, the significant MTAs identified in this study should be explored and validated further to estimate their effects. This should be done using segregating populations and different genetic backgrounds before utilization in gene discovery and marker-assisted breeding for drought tolerance. Functional characterization and the application of gene knockout to the identified putative candidate genes would further confirm their roles in regulating drought stress response, growth and development under drought stressed and well-watered conditions. More powerful statistical genetics tools such as genomic prediction models would be needed to identify minor genes that are associated with agronomic and physiological traits. The admixed genotypes identified in this study offer potential as genetic resources in drought tolerance and biofortification breeding programmes, especially within the sugar, red mottled and navy bean market classes.
Supporting information S1 Table. Characteristics of common bean genotypes used in the study, their sources and structure membership coefficient (K2) for K = 2.